Parallel k Nearest Neighbor Graph Construction Using Tree-Based Data Structures
Abstract
Constructing a nearest neighbor graph is a necessary step in many machine learning applications. However, constructing such a graph is computationally expensive, especially when the data is high-dimensional. In this work, we focus on two tree structures, k-d trees and ball trees, for nearest neighbor graph construction. We present parallel implementations of nearest neighbor graph construction using these tree structures, with parallelism provided by OpenMP and the Galois framework. Our results show that k-d trees are faster when the number of dimensions is small (N >> 2^d for N points in d dimensions), whereas ball trees scale well with the number of dimensions. Our Galois implementation with maxspread was faster than OpenMP for data that satisfied the k-d tree requirement, irrespective of the number of threads. OpenMP ball trees were faster than Galois k-d trees for datasets with a large number of dimensions, irrespective of the number of threads.
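As an illustration of the approach the abstract describes, the following is a minimal Python sketch of k-nearest-neighbor graph construction with a k-d tree. It is not the authors' implementation: the function names (build_kdtree, knn_query, knn_graph), the leaf size, and the dictionary-based node layout are all assumptions made for this example. The split rule picks the dimension with the largest coordinate range, which is one common reading of "maxspread". A standard-library thread pool stands in for OpenMP or Galois only to show that the per-point queries are independent once the tree is built; under CPython's global interpreter lock it should not be expected to give a real speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, indices=None, leaf_size=8):
    """Build a k-d tree over point indices (hypothetical dict-based layout)."""
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= leaf_size:
        return {"leaf": indices}

    # Assumed max-spread rule: split on the coordinate with the largest range.
    def spread(d):
        vals = [points[i][d] for i in indices]
        return max(vals) - min(vals)

    axis = max(range(len(points[0])), key=spread)
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2  # median split keeps the tree balanced
    return {
        "axis": axis,
        "split": points[indices[mid]][axis],
        "left": build_kdtree(points, indices[:mid], leaf_size),
        "right": build_kdtree(points, indices[mid:], leaf_size),
    }


def knn_query(points, node, q, k, heap=None):
    """Collect the k nearest neighbors of point q as a max-heap of (-sq_dist, index)."""
    if heap is None:
        heap = []
    q_pt = points[q]
    if "leaf" in node:
        for i in node["leaf"]:
            if i == q:
                continue  # a point is not its own neighbor
            d = sq_dist(points[i], q_pt)
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        return heap
    diff = q_pt[node["axis"]] - node["split"]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    knn_query(points, near, q, k, heap)
    # Visit the far side only if the splitting plane is closer than the current k-th neighbor.
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn_query(points, far, q, k, heap)
    return heap


def knn_graph(points, k, workers=4):
    """Return, for every point, the indices of its k nearest neighbors (nearest first)."""
    tree = build_kdtree(points)  # read-only after construction, so queries can run concurrently

    def neighbors(q):
        heap = knn_query(points, tree, q, k)
        return [i for _, i in sorted(heap, reverse=True)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(neighbors, range(len(points))))
```

Calling knn_graph(points, k) on a list of equal-length coordinate tuples returns one neighbor list per point. The pruning test in knn_query is what degrades as the number of dimensions grows: more splitting planes fall within the current search radius, so more of the tree is visited, which is consistent with the abstract's observation that k-d trees are favorable only when the number of dimensions is small.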
Similar resources
EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph
Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical structure (tree) based methods decreases as the dimensionality of data grows, while hashing based methods usually lack efficiency in practice. Recently, the graph based methods have drawn considerable attention. The ma...
Parallel Construction of k-Nearest Neighbor Graphs for Point Clouds
We present a parallel algorithm for k-nearest neighbor graph construction that uses Morton ordering. Experiments show that our approach has the following advantages over existing methods: (1) Faster construction of k-nearest neighbor graphs in practice on multi-core machines. (2) Less space usage. (3) Better cache efficiency. (4) Ability to handle large data sets. (5) Ease of parallelization an...
Using the Mutual k-Nearest Neighbor Graphs for Semi-supervised Classification on Natural Language Data
The first step in graph-based semi-supervised classification is to construct a graph from input data. While the k-nearest neighbor graphs have been the de facto standard method of graph construction, this paper advocates using the less well-known mutual k-nearest neighbor graphs for high-dimensional natural language data. To compare the performance of these two graph construction methods, we ru...
k-NN Graph Construction: a Generic Online Approach
Nearest neighbor search and k-nearest neighbor graph construction are two fundamental issues that arise in many disciplines, such as information retrieval, data mining, machine learning and computer vision. Although continuous efforts have been made over the last several decades, these two issues remain challenging. They become more and more pressing as big data emerges in various fields and h...
Lazy Classifiers Using P-trees
Lazy classifiers store all of the training samples and do not build a classifier until a new sample needs to be classified. They differ from eager classifiers, such as decision tree induction, which build a general model (such as a decision tree) before receiving new samples. K-nearest neighbor (KNN) classification is a typical lazy classifier. Given a set of training data, a k-nearest neighbor c...